Informative Variables Selection for Multi-relational Supervised Learning
Abstract
In multi-relational data mining, data are represented in relational form, where the individuals of the target table are potentially related to several records in secondary tables through one-to-many relationships. To cope with this one-to-many setting, most existing approaches transform the multi-table representation, notably by propositionalisation, thereby losing the naturally compact initial representation and possibly introducing statistical bias. Our approach aims to directly evaluate the informativeness of the original input variables over the relational domain with respect to the target variable. The idea is to summarize, for each individual, the information contained in a non-target table variable by a tuple of features representing the cardinalities of its initial modalities. Multivariate grid models are used to assess the joint information brought by the new features, which is equivalent to estimating the conditional density of the target variable given the input variable in the non-target table. Preliminary experiments on artificial and real data sets show that the approach can potentially identify relevant one-to-many variables. In this article, we focus on binary variables because of space constraints.
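The summarization step described in the abstract can be illustrated with a minimal Python sketch for a binary secondary variable. It is not the authors' implementation: the function names, the toy records, and the fixed bin edge are all hypothetical, and the grid here uses hand-picked intervals, whereas the paper refers to multivariate grid models whose partition would be selected by the model itself.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def cardinality_features(records, modalities=(0, 1)):
    """Map each individual id to a tuple of per-modality record counts.

    `records` is an iterable of (individual_id, value) pairs taken from a
    secondary table; the name and layout are assumptions for illustration.
    """
    counts = defaultdict(Counter)
    for individual_id, value in records:
        counts[individual_id][value] += 1
    return {i: tuple(c[m] for m in modalities) for i, c in counts.items()}


def grid_class_frequencies(features, targets, edges):
    """Count target classes in each cell of a fixed grid over the features.

    `edges` are hand-picked interval boundaries (an assumption); each count
    is mapped to the index of the interval it falls into.
    """
    cells = defaultdict(Counter)
    for individual_id, feats in features.items():
        cell = tuple(bisect_right(edges, f) for f in feats)
        cells[cell][targets[individual_id]] += 1
    return {cell: dict(c) for cell, c in cells.items()}


# Toy secondary table: individual "a" has three records, "b" two, "c" one.
records = [("a", 1), ("a", 1), ("a", 0), ("b", 0), ("b", 0), ("c", 1)]
targets = {"a": "pos", "b": "neg", "c": "pos"}

features = cardinality_features(records)
# features == {"a": (1, 2), "b": (2, 0), "c": (0, 1)}

grid = grid_class_frequencies(features, targets, edges=[1])
# grid == {(1, 1): {"pos": 1}, (1, 0): {"neg": 1}, (0, 1): {"pos": 1}}
```

The per-cell class counts give a crude empirical estimate of the target distribution conditional on the pair of counts. Individuals with no record in the secondary table do not appear in `features` in this sketch and would need to be added with a tuple of zeros.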
Similar Resources
Itemset-Based Variable Construction in Multi-relational Supervised Learning
In multi-relational data mining, data are represented in relational form, where the individuals of the target table are potentially related to several records in secondary tables through one-to-many relationships. In this paper, we introduce an itemset-based framework for constructing variables in secondary tables and evaluating their conditional information for the supervised classification task. W...
Exploring the Gap Between Variable Selection and Dimensionality Reduction
The Problem: This project addresses the gap between variable selection algorithms and dimensionality reduction algorithms. Variable selection algorithms are designed to produce sparse solutions in which only a few variables are marked as relevant. This is not suitable for highly correlated data such as the gray values of an image. Dimensionality reduction algorithms (e.g., PCA) tend to combine al...
Locally Consistent Bayesian Network Scores for Multi-Relational Data
An important task for relational learning is Bayesian network (BN) structure learning. A fundamental component of structure learning is a model selection score that measures how well a model fits a dataset. We describe a new method that upgrades for multi-relational databases, a loglinear BN score designed for single-table i.i.d. data. Chickering and Meek showed that for i.i.d. data, standard B...
Towards Automatic Feature Construction for Supervised Classification
We suggest an approach to automate variable construction for supervised learning, especially in the multi-relational setting. Domain knowledge is specified by describing the structure of the data by means of variables, tables and links across tables, and by choosing construction rules. The space of variables that can be constructed is virtually infinite, which raises both combinatorial and over-fi...
Evolutionary Approaches to the Learning of Fuzzy Rule-Based Classification Systems
The learning of a Fuzzy Rule-Based Classification System (FRBCS) by means of a supervised inductive process fundamentally involves four complementary tasks: the selection of the most informative variables for the classification problem at hand, the generation of a set of rules, the selection of the subset of rules with the best co-operation and the least redundancy, and the e...
Publication date: 2011